Abstract: In the field of speech signal processing, the spectral subtraction method (SSM) has been successfully used to suppress acoustically added noise. SSM reduces the noise to a satisfactory level, but musical noise is its major drawback. Implementing SSM requires transforming the speech signal from the time domain to the frequency domain. The wavelet transform, on the other hand, reveals a different aspect of the speech signal. In this paper we apply a new approach in which SSM is cascaded with a wavelet thresholding technique (WTT) to improve the quality of the speech signal by removing the problem of musical noise to a great extent. The proposed system has been simulated in MATLAB.

Keywords: Coefficient Thresholding, Musical Noise, 
SSM, Wavelet Coefficients, WTT. 

I. INTRODUCTION 

The musical noise produced by SSM is a major drawback of this method, and many techniques have been proposed to reduce it. This paper proposes a new technique in which SSM is cascaded with WTT for musical noise reduction.

SSM requires transforming the signal from the time domain to the frequency domain using the FFT. In this method, a voice activity detector [1] is used to detect whether a segment of the signal contains speech activity or only non-speech activity. The method directly estimates the short-term spectral magnitude of the speech signal by subtracting a noise spectrum estimated during non-speech activity. Spectral subtraction is successful in stationary or slowly varying noise environments; otherwise the noise estimate is inaccurate and the system generates musical noise [10]. On the other hand, transforming a signal into the wavelet domain simply splits it into low-frequency and high-frequency components using a low-pass filter and a high-pass filter, whose outputs are the wavelet coefficients. In this method, a thresholding technique is used for signal de-noising that discards the coefficients below a threshold level.



WTT [7] has been successfully used for image de-noising, but comparatively little attention has been paid to its practical implementation for speech signals. WTT can de-noise [2] a signal without noticeable loss because it reveals aspects such as trends, breakdown points and discontinuities in higher derivatives. In this paper we have cascaded [8] WTT with the spectral subtraction method because the two techniques take different approaches to signal de-noising. SSM is applied first, and its output is then given as the input to WTT for better results. This new method is expected to be effective for military applications and real-time noisy environments.

II. SPECTRAL SUBTRACTION METHOD (SSM) 

A. Introduction 

SSM is very popular and useful for acoustic noise suppression because of its relative simplicity and ease of implementation. The method restores the power spectrum or the magnitude spectrum of a speech signal that contains additive noise. Noise is added acoustically or digitally to the original speech signal, producing a noisy speech signal. An estimate of the noise spectrum is then obtained and updated from the periods of non-speech activity, when only noise is present. This noise spectrum estimate is subtracted from the spectrum of the noisy signal, giving an estimate of the clean reconstructed signal. Generally, spectral subtraction is effective in stationary or slowly varying noise environments.
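The noise-estimation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the speech/non-speech flags are assumed to come from a voice activity detector, frames are assumed to be equal-length and already windowed, and a naive DFT stands in for the FFT. The function names `dft` and `estimate_noise_psd` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, standing in for an FFT."""
    n = len(frame)
    return [
        sum(frame[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
        for k in range(n)
    ]


def estimate_noise_psd(frames, is_speech):
    """Average |N(k)|^2 over the frames flagged as non-speech.

    `is_speech` is assumed to be the per-frame output of a voice activity
    detector; here it is just a list of booleans supplied by the caller.
    """
    noise_frames = [f for f, speech in zip(frames, is_speech) if not speech]
    if not noise_frames:
        raise ValueError("no non-speech frames available for the noise estimate")
    n_bins = len(noise_frames[0])
    psd = [0.0] * n_bins
    for frame in noise_frames:
        # Accumulate the mean power per frequency bin.
        for k, coeff in enumerate(dft(frame)):
            psd[k] += abs(coeff) ** 2 / len(noise_frames)
    return psd
```

Averaging over the non-speech frames gives an estimate of E{|N(e^{jω})|^2} per frequency bin. A running (recursive) average is another common way to keep the estimate updated, but it is not shown here.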

B. Mathematical Approach 

Suppose the speech signal x(m) is corrupted by noise n(m), which yields the noisy signal

$$y(m) = x(m) + n(m) \qquad (1)$$

After windowing the signal,

$$y_w(m) = x_w(m) + n_w(m) \qquad (2)$$

The Fourier transform of equation (2) is

$$Y_w(e^{j\omega}) = X_w(e^{j\omega}) + N_w(e^{j\omega}) \qquad (3)$$

where $Y_w(e^{j\omega})$, $X_w(e^{j\omega})$ and $N_w(e^{j\omega})$ are the Fourier transforms of the noisy speech, original speech and noise signals, respectively.

For simplicity, the subscript w (windowed) is dropped from here on.

Multiplying both sides by their complex conjugates, we find

$$|Y(e^{j\omega})|^2 = |X(e^{j\omega})|^2 + |N(e^{j\omega})|^2 + 2|X(e^{j\omega})||N(e^{j\omega})|\cos\Delta\theta \qquad (4)$$

where $\Delta\theta$ is the phase difference between the speech signal and the noise signal:

$$\Delta\theta = \angle X(e^{j\omega}) - \angle N(e^{j\omega}) \qquad (5)$$
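The step from equation (3) to equation (4) can be written out explicitly (arguments $e^{j\omega}$ omitted). The only fact used is that $2\,\mathrm{Re}\{X N^*\} = 2|X||N|\cos\Delta\theta$ for the phase difference defined in equation (5).

```latex
\begin{aligned}
|Y|^2 &= Y\,Y^* = (X + N)(X^* + N^*) \\
      &= |X|^2 + |N|^2 + X N^* + X^* N \\
      &= |X|^2 + |N|^2 + 2\,\mathrm{Re}\{X N^*\} \\
      &= |X|^2 + |N|^2 + 2|X||N|\cos\Delta\theta
\end{aligned}
```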



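Taking the expected value of both sides of equation (4), and treating the magnitudes and the phase difference as statistically independent, gives

$$E\{|Y(e^{j\omega})|^2\} = E\{|X(e^{j\omega})|^2\} + E\{|N(e^{j\omega})|^2\} + 2E\{|X(e^{j\omega})|\}\,E\{|N(e^{j\omega})|\}\,E\{\cos\Delta\theta\} \qquad (6)$$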



1. Power spectral subtraction:

For power spectral subtraction it is assumed that $E\{\cos\Delta\theta\} = 0$, so equation (6) becomes

$$E\{|Y(e^{j\omega})|^2\} = E\{|X(e^{j\omega})|^2\} + E\{|N(e^{j\omega})|^2\}$$

and the speech power spectrum is estimated as

$$|\hat{X}(e^{j\omega})|^2 = |Y(e^{j\omega})|^2 - E\{|N(e^{j\omega})|^2\} \qquad (7)$$
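Equation (7) applied bin by bin can be sketched in Python as below. The sketch assumes the noisy power spectrum |Y|^2 and the averaged noise power E{|N|^2} are already available as equal-length lists. Clipping negative differences to zero (half-wave rectification) is an added assumption, since equation (7) does not say how to handle bins where the noise estimate exceeds the noisy power. The function name is hypothetical.

```python
def power_spectral_subtraction(noisy_power, noise_power):
    """Equation (7) per frequency bin: |X(k)|^2 = |Y(k)|^2 - E{|N(k)|^2}.

    Negative results are clipped to zero (half-wave rectification); this
    clipping is an assumption of the sketch, not part of equation (7).
    """
    if len(noisy_power) != len(noise_power):
        raise ValueError("spectra must have the same number of bins")
    return [max(y - n, 0.0) for y, n in zip(noisy_power, noise_power)]
```

Isolated bins that survive the subtraction while their neighbours are clipped to zero are one way the short tonal artifacts known as musical noise arise.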

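2. Magnitude spectral subtraction:

For magnitude spectral subtraction it is assumed that $E\{\cos\Delta\theta\} = 1$, so equation (6) becomes

$$E\{|Y(e^{j\omega})|^2\} = E\{|X(e^{j\omega})|^2\} + E\{|N(e^{j\omega})|^2\} + 2E\{|X(e^{j\omega})|\}\,E\{|N(e^{j\omega})|\}$$

Approximating each second moment by the square of the corresponding mean, the right-hand side becomes a perfect square, so that

$$E\{|Y(e^{j\omega})|\} = E\{|X(e^{j\omega})|\} + E\{|N(e^{j\omega})|\}$$

$$|\hat{X}(e^{j\omega})| = |Y(e^{j\omega})| - E\{|N(e^{j\omega})|\} \qquad (8)$$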
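A single-frame sketch of equation (8) in Python follows, under these assumptions: the frame is already windowed, `noise_mag` holds E{|N(k)|} per DFT bin (for example averaged over non-speech frames), negative magnitudes are clipped to zero, and the phase of the noisy spectrum is reused for reconstruction, a common choice that the text above does not state. A naive DFT/IDFT replaces the FFT so the block is self-contained; all names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive forward DFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)).real / n
            for m in range(n)]


def magnitude_spectral_subtraction(frame, noise_mag):
    """Equation (8): |X(k)| = |Y(k)| - E{|N(k)|}, keeping the noisy phase."""
    if len(frame) != len(noise_mag):
        raise ValueError("noise estimate must have one value per bin")
    clean = []
    for y_k, n_k in zip(dft(frame), noise_mag):
        mag = max(abs(y_k) - n_k, 0.0)                 # clip negative magnitudes
        clean.append(cmath.rect(mag, cmath.phase(y_k)))  # reuse noisy phase
    return idft(clean)
```

With a zero noise estimate the frame is returned unchanged (up to rounding), which is a quick sanity check of the DFT/IDFT pair.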

The procedure of the spectral subtraction method is shown in Figure 1.

Figure 1: Basic blocks of spectral subtraction method



III. WAVELET THRESHOLDING TECHNIQUE (WTT)

A. Introduction 

SSM is effective for stationary or slowly varying noise, but in mobile communication the noise is generally not stationary. The next possible improvement is therefore to further reduce musical noise using WTT. For the wavelet transform, the output speech signal x(m) of the spectral subtraction method is taken as the input signal, and that signal is divided into low-frequency and high-frequency components. The output of the low-pass filter (LPF) is known as the approximation coefficients and the output of the high-pass filter (HPF) is called the detail coefficients. When we listen to the level-1 approximation coefficients [9] using the MATLAB command sound(cA1, Fs, bit depth), the speech remains intelligible with only a small loss in quality. This shows that the low-frequency components contain the essential information, which is why the output of the LPF is called the approximation coefficients. The output of the HPF contains only high-frequency, non-essential information and is known as the detail coefficients. To apply the wavelet technique, we first have to choose an appropriate mother wavelet and a decomposition level. The choice of mother wavelet depends on the type of signal to be decomposed. In speech de-noising our objective is to improve the quality of the signal, so the wavelet can be selected on the basis of how well it conserves energy in the approximation coefficients [7]. With Daubechies D20, D6, D4, D2 or Haar wavelets, the level-1 approximation coefficients contain more than 90% of the signal energy. When selecting a decomposition level for frame-based input, the frame size must be a multiple of 2^n, where n is the decomposition level. In this paper, we have selected Daubechies mother wavelets and a decomposition level of 6.
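The split into approximation and detail coefficients, the multi-level decomposition and the frame-length rule can be illustrated with the Haar wavelet, the simplest member of the Daubechies family. This is a minimal Python sketch rather than the MATLAB toolbox routines used in the paper; Db2 to Db6 need longer filters and boundary handling that are not shown, and all function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_step(x):
    """One analysis level: low-pass (approximation) and high-pass (detail)."""
    if len(x) % 2:
        raise ValueError("length must be even")
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_wavedec(x, level):
    """Multi-level decomposition; returns [cA_level, cD_level, ..., cD1]."""
    if len(x) % (2 ** level):
        raise ValueError(f"frame length must be a multiple of 2**{level}")
    details = []
    approx = list(x)
    for _ in range(level):
        approx, d = haar_step(approx)   # split the current approximation again
        details.append(d)
    return [approx] + details[::-1]


def approx_energy_percent(x):
    """Share of the frame's energy kept in the level-1 approximation."""
    cA1, _ = haar_step(x)
    total = sum(v * v for v in x)
    return 100.0 * sum(v * v for v in cA1) / total
```

Because this Haar transform is orthonormal, the energy of all coefficients equals the energy of the frame, which is what makes the percentage of energy in the approximation a meaningful basis for choosing a wavelet.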

B. Wavelet approach for musical noise reduction 

The wavelet thresholding technique is a useful and distinct approach to residual noise reduction. Residual noise arises from variations in the background noise, which is why it occurs throughout the whole utterance, during speech activity as well as non-speech activity. With the wavelet thresholding technique we exploit the fact that residual noise consists of narrow peaks, which are relatively high-frequency components. As the histogram representation shows, more than 90% of the components of the speech signal have values at or near zero. A threshold value is therefore selected and all coefficients with values lower than the threshold are truncated, so the wavelet thresholding technique removes the residual noise (also called musical noise in the time domain) to a great extent.
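The truncation step and the histogram observation can be sketched in Python as below. The threshold `thr` and the tolerance `eps` that counts a coefficient as near zero are hypothetical parameters chosen by the caller; the text does not give their values.

```python
def truncate_below(coeffs, thr):
    """Set every coefficient whose magnitude is below thr to zero."""
    return [c if abs(c) >= thr else 0.0 for c in coeffs]


def fraction_near_zero(coeffs, eps):
    """Fraction of coefficients with |c| <= eps, as read off a histogram."""
    if not coeffs:
        return 0.0
    return sum(1 for c in coeffs if abs(c) <= eps) / len(coeffs)
```

If most coefficients fall inside the near-zero bin, truncating them removes little speech energy, which is the property the technique relies on.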
Figure 2: Histogram representation
IV. THRESHOLDING OF COEFFICIENTS 

After the wavelet transform is applied, the input signal is decomposed into coefficients. These coefficients are then thresholded for signal de-noising, using either hard or soft thresholding. Generally, hard thresholding is used for signal compression and soft thresholding for signal de-noising; here we have used soft thresholding to de-noise the signals. Soft thresholding is an extension of hard thresholding in which the elements whose absolute values are lower than the threshold are first set to zero and the nonzero coefficients are then shrunk toward 0. Once soft thresholding is chosen, there are two ways of finding the threshold value: global thresholding and level-dependent thresholding. In global thresholding, a single threshold value is set manually. For level-dependent thresholding, we use the Birgé-Massart strategy [7], which yields a different threshold value for each level. To de-noise a signal we use the MATLAB command wdencmp, which lets us choose between global and level-dependent thresholding. Coefficient thresholding discards the coefficients whose values fall below the threshold, resulting in a de-noised signal. In the wavelet de-noising method we take as the input signal x(m), the output signal of SSM. The steps involved in the wavelet de-noising process are shown in Figure 3.
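Soft thresholding with either a single global threshold or one threshold per level can be sketched in Python as follows. This does not reproduce wdencmp or the Birgé-Massart rule: the per-level thresholds are simply passed in by the caller, and leaving the approximation coefficients untouched is an assumption of this sketch. The coefficient layout `[cA, cD_level, ..., cD1]` and the function names are hypothetical.

```python
import math


def soft_threshold(c, thr):
    """Zero |c| < thr and shrink the remaining values toward zero by thr."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def threshold_coeffs(coeffs, thr):
    """Soft-threshold the detail levels of coeffs = [cA, cD_level, ..., cD1].

    thr is either one number (global thresholding) or a list with one value
    per detail level (level-dependent thresholding). cA is kept unchanged.
    """
    details = coeffs[1:]
    thrs = list(thr) if isinstance(thr, (list, tuple)) else [thr] * len(details)
    if len(thrs) != len(details):
        raise ValueError("need one threshold per detail level")
    out = [list(coeffs[0])]
    for d, t in zip(details, thrs):
        out.append([soft_threshold(c, t) for c in d])
    return out
```

Compared with plain truncation, the shrinkage avoids a jump at the threshold, which is one reason soft thresholding is usually preferred for de-noising.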



Figure 3: Wavelet de-noising process (blocks: x(m), select a wavelet, wavelet decomposition, thresholding & truncation)
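The blocks of the wavelet de-noising process (decomposition, thresholding, and reconstruction of the de-noised signal) can be chained in a short Python sketch. It assumes the Haar wavelet, a single global soft threshold applied to all detail levels, and a frame length that is a multiple of 2^level; the Daubechies filters and Birgé-Massart thresholds used in the paper are not reproduced, and all names are hypothetical.

```python
import math

S = math.sqrt(2.0)


def dwt_haar(x):
    """One Haar analysis level: (approximation, detail)."""
    return ([(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)],
            [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)])


def idwt_haar(a, d):
    """Inverse of dwt_haar: interleave the reconstructed sample pairs."""
    out = []
    for ai, di in zip(a, d):
        out.extend([(ai + di) / S, (ai - di) / S])
    return out


def soft(c, t):
    return math.copysign(max(abs(c) - t, 0.0), c)


def wavelet_denoise(x, level, thr):
    """Decompose x(m), soft-threshold all detail levels, reconstruct."""
    if len(x) % (2 ** level):
        raise ValueError("frame length must be a multiple of 2**level")
    approx, details = list(x), []
    for _ in range(level):                       # wavelet decomposition
        approx, d = dwt_haar(approx)
        details.append(d)
    details = [[soft(c, thr) for c in d] for d in details]  # thresholding
    for d in reversed(details):                  # reconstruct, coarsest first
        approx = idwt_haar(approx, d)
    return approx
```

With `thr = 0` the input is reconstructed exactly (up to rounding); a small alternating ripple on a constant signal lives entirely in the detail coefficients and is removed by a modest threshold.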



V. PERFORMANCE ANALYSIS OF PROPOSED SYSTEM

The performance of the proposed system has been analysed in terms of the peak signal-to-noise ratio (PSNR) and the normalized root mean square error (NRMSE).



PSNR has been evaluated using

$$\mathrm{PSNR} = 10\log_{10}\frac{N X^2}{\|x - r\|^2}$$

where N is the length of the reconstructed signal, X is the maximum absolute value of the signal x (so that X^2 is its peak squared value), and $\|x - r\|^2$ is the energy of the difference between the original and reconstructed signals.
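The PSNR formula above translates directly into Python. This sketch assumes x and r are equal-length lists of samples and returns infinity when the reconstruction is exact; the function name is hypothetical.

```python
import math


def psnr(x, r):
    """PSNR = 10*log10(N * X^2 / ||x - r||^2) with X = max |x(n)|."""
    if len(x) != len(r):
        raise ValueError("signals must have the same length")
    err = sum((a - b) ** 2 for a, b in zip(x, r))   # ||x - r||^2
    if err == 0:
        return math.inf
    peak = max(abs(a) for a in x)                   # X
    return 10.0 * math.log10(len(r) * peak ** 2 / err)
```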

NRMSE has been evaluated using

$$\mathrm{NRMSE} = \sqrt{\frac{\sum_n \left(x(n) - r(n)\right)^2}{\sum_n \left(x(n) - \mu_x(n)\right)^2}}$$

where x(n) is the speech signal, r(n) is the reconstructed signal and $\mu_x(n)$ is the mean of the speech signal.
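A matching Python sketch of the NRMSE formula, assuming equal-length sample lists; the function name is hypothetical.

```python
import math


def nrmse(x, r):
    """sqrt( sum (x - r)^2 / sum (x - mean(x))^2 )."""
    if len(x) != len(r) or not x:
        raise ValueError("signals must be non-empty and of equal length")
    mu = sum(x) / len(x)
    num = sum((a - b) ** 2 for a, b in zip(x, r))
    den = sum((a - mu) ** 2 for a in x)
    if den == 0:
        raise ValueError("x is constant, so NRMSE is undefined")
    return math.sqrt(num / den)
```

For reference, a reconstruction equal to the constant mean of x(n) gives NRMSE = 1, which is a useful anchor when reading the NRMSE values in Table (a).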

For better results, PSNR should be as high as possible and NRMSE as low as possible.

We have used a 5 s speech signal spoken by a male speaker, sampled at 8 kHz with a bit depth of 16 bits, as shown in Figure 4.

Figure 4: Original speech signal

After random noise was digitally added to the original speech signal, the noisy speech signal shown in Figure 5 was obtained.

Figure 5: Noisy signal

We applied SSM for signal de-noising and obtained the reconstructed signal shown in Figure 6.



Figure 6: Output de-noised signal of spectral subtraction (plot title: Reconstructed signal)

After obtaining the de-noised output of SSM, we used the command sound(reconstructed signal, Fs, bit depth) to listen to it and found a great improvement in the quality of the signal (the PSNR and NRMSE of x(m) after SSM are 13.4981 dB and 1.0818, respectively), but a little residual noise, identified as musical noise, could still be heard. We therefore used a new technique for reducing musical noise in which the signal reconstructed by SSM is taken as the input signal for WTT. After transforming this signal into wavelet coefficients and applying thresholding, we obtained an output signal with reduced musical noise. This final output signal is shown in Figure 7.
Figure 7: Signal with reduced musical noise using the Haar wavelet

Table (a)

Wavelet type | Decomposition level | Retained energy (%) | PSNR (dB) | NRMSE
Haar | 6 | 83.7857 | 14.4836 | 1.0298
Db2 | 6 | 86.5747 | 14.3677 | 1.0357
Db4 | 6 | 87.9903 | 14.2931 | 1.0396
Db6 | 6 | 88.5790 | 14.2650 | 1.0411
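The retained-energy column can be computed with a few lines of Python, assuming it follows the usual L2 recovery score: 100 times the energy of the thresholded wavelet coefficients divided by the energy of the original coefficients. The coefficient layout and the function name are hypothetical, and the table values themselves come from the paper's MATLAB simulation and are not reproduced here.

```python
def retained_energy_percent(coeffs, coeffs_thr):
    """100 * ||thresholded coefficients||^2 / ||original coefficients||^2.

    Both arguments are lists of coefficient levels, e.g. [cA, cD_n, ..., cD1].
    """
    total = sum(c * c for level in coeffs for c in level)
    if total == 0:
        raise ValueError("original coefficients have zero energy")
    kept = sum(c * c for level in coeffs_thr for c in level)
    return 100.0 * kept / total
```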



With SSM alone, the PSNR is 13.4981 dB and the NRMSE is 1.0818; the PSNR and NRMSE values given in Table (a) were observed with the proposed system (SSM+WTT). These values show that cascading SSM with WTT improves the speech signal: PSNR rises by about 0.77 to 0.99 dB and NRMSE falls for every wavelet tested.




Figure 8: Performance evaluation based on PSNR 
Figure 9: Performance evaluation based on NRMSE

VI. CONCLUSION AND FUTURE SCOPE 

Musical noise is a problem of the spectral subtraction method that has been largely removed using the wavelet thresholding technique (WTT). In this paper we have proposed a new system (SSM+WTT) that cascades SSM and WTT, and the proposed system performs better than SSM alone. The result of this combined system is evident from the waveform shown in Figure 7 and from the differences between the PSNR and NRMSE values. Table (a) lists the type of mother wavelet, the decomposition level, the percentage of signal energy retained in the de-noised signal, the peak signal-to-noise ratio (PSNR) and the NRMSE. The Haar wavelet gives the highest PSNR and the lowest NRMSE. The results have been simulated in MATLAB.

In future work, using the wavelet packet transform instead of the wavelet transform, together with an adaptive thresholding technique, may further improve the quality of the reconstructed speech signal.